ABSTRACT


Many measurement problems can be formulated as follows. First, a linear relationship between two variables is estimated from pairs of input and output data. Thereafter, the value of an unknown input variable is estimated from an observation of the corresponding output variable. This problem is often referred to as inverse regression or discrimination.

This paper first presents non-Bayesian approaches to the problem and then the Bayesian approach by Hoadley. Third, a Bayesian approach by Avenhaus and Jewell, which uses the ideas of credibility theory, is discussed. Finally, a new Bayesian approach is presented. The advantages and disadvantages of the various approaches are summarized.




FORMULATION  OF  THE  PROBLEM 


The relationship between an independent variable x and a response variable y can often be described by the linear regression model

$$y_i = \alpha + \beta x_i + \sigma u_i, \quad i = 1,\ldots,n,$$

where the $u_i$ are independently and identically distributed random variables with mean zero and variance one. Usually the $u_i$ are assumed to be normally distributed, i.e.,

$$P(u_i \le t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\frac{t'^2}{2}\right) dt', \quad i = 1,\ldots,n.$$

The problem is to estimate the unknown parameters $\alpha$, $\beta$ and $\sigma$.

The inverse linear regression problem is an extension of the above: here, in addition to the responses $y_1,\ldots,y_n$ corresponding to the n known independent variables $x_1,\ldots,x_n$, there are m further responses $z_1,\ldots,z_m$ corresponding to a single unknown x. The model is

$$y_i = \alpha + \beta x_i + \sigma u_i, \quad i = 1,\ldots,n,$$

$$z_j = \alpha + \beta x + \tau v_j, \quad j = 1,\ldots,m,$$

where the $u_i$ and $v_j$ are independently and identically distributed random variables with mean zero and variance one. The problem is to make inferences about x.

Four  examples  of  this  class  of  problem  are  given  below. 


Nuclear materials, e.g. plutonium, are extremely difficult to measure directly by chemical means. Therefore, one uses indirect methods, based upon the heat production or the number of neutrons emitted, in order to estimate the amount of material present. From well-known physical laws, we have a general relationship between these variables, but any measurement instrument based on these principles first needs to be calibrated. Usually, this calibration can be done with the aid of standard inputs containing known amounts of nuclear materials. However, these inputs ($x_i$) are not generally under our control and, in some cases, may have residual imprecisions in their values.

Measurement instruments often have longer-term drifts, during which they tend to lose their original calibration. For this reason, measurement of a given production run often consists of two distinct phases: (re)calibration of the instrument, and actual indirect measurement. With a fixed amount of time available, it is of interest to determine how much time should be spent on each of the two phases, assuming that additional time spent on each observation reduces the observational error.

Estimation of Family Incomes by Polling

We wish to estimate, through a public opinion poll, the distribution of family incomes in a certain city district. As the major part of the population will not be willing to divulge their incomes, or will give only a very imprecise figure, we look for a dependent variable which can be more easily determined. According to the literature (see, e.g., Muth (1960)), housing expenses are strongly related to family income and, furthermore, it may be assumed that the population is less reluctant to divulge this figure, even though they may not be able to do so precisely. Clearly, to determine this relationship exactly, we must have some families in this district who are willing to give both their total income and their household expenses. On the other hand, we have strong prior information on this relationship from similar surveys, and may have general information on income distribution from census and other sources.

Missing Variables in Bayesian Regression

In a paper with this title, Press and Scott (19'i) consider a simple linear regression problem in which certain of the independent variables, $x_i$, are assumed to be missing in a nonsystematic way from the data pairs $(x_i, y_i)$. Then, under special assumptions about the error and prior distributions, they show that an optimal procedure for estimating the linear parameters is to first estimate the missing $x_i$ from an inverse regression based only on the complete data pairs.


Bioassay

Using the methods of bioassay, the effect on organisms of substances given in several dosages is investigated. A problem of inverse regression arises if the parameters of dosage response curves first have to be estimated from observations, and later an indirect assay is carried out to determine the dosage necessary for an effect of interest (see, e.g., Rasch, Enderlein, Herrendorfer (1973)).

Problems of this kind are described in textbooks on the theory of measurements and are sometimes called discrimination problems (Brownlee (1965), Miller (1966)). They differ from the subject of 'Stochastic Approximation' (see, e.g., Wasan (1969)) in that the regression function is assumed to be linear. Stochastic approximation only requires some monotonicity, but this advantage is outweighed by the superiority of standard methods over the stochastic approximation method in the case of linear regression functions. Therefore the procedures of stochastic approximation will not be examined in this report.

In the following, the non-Bayesian approaches to the inverse linear regression problem are presented first; in particular, the difficulty of the infinite variances of all the estimates is outlined. Thereafter, the Bayesian approach by Avenhaus and Jewell (1975) is discussed, which uses the ideas of credibility theory and which has so far been written down only in the form of an internal report. Finally, a new Bayesian approach is presented here for the first time. In the conclusion the advantages and disadvantages of the various approaches are summarized. The present situation may be characterized by saying that there are promising attempts, but that there is not yet a satisfying solution to the admittedly difficult problem.


NON-BAYESIAN APPROACHES


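A well-known approach is first to estimate $\alpha$ and $\beta$. The maximum likelihood and least squares estimates of $\alpha$ and $\beta$ based on $y_1,\ldots,y_n$ are

$$\hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad \hat\alpha = \bar y - \hat\beta\,\bar x,$$

where $\bar y$ and $\bar x$ denote the mean values of $y_1,\ldots,y_n$ and of $x_1,\ldots,x_n$, respectively. This leads to the 'classical' estimator

$$\hat x_C = \frac{\bar z - \hat\alpha}{\hat\beta},$$

where $\bar z$ denotes the mean value of $z_1,\ldots,z_m$.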
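The classical estimator can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classical_estimate` and the noise-free toy data are hypothetical and do not come from the paper.

```python
def classical_estimate(xs, ys, zs):
    """Return (alpha_hat, beta_hat, x_C) from calibration pairs (xs, ys)
    and further responses zs that belong to one unknown x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    z_bar = sum(zs) / len(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx                  # least squares slope
    alpha_hat = y_bar - beta_hat * x_bar  # least squares intercept
    if beta_hat == 0.0:
        raise ZeroDivisionError("slope estimate is zero; x_C is undefined")
    return alpha_hat, beta_hat, (z_bar - alpha_hat) / beta_hat


# Hypothetical noise-free data: y = 1 + 2 x, and z = 6 corresponds to x = 2.5.
alpha_hat, beta_hat, x_c = classical_estimate(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], [6.0, 6.0]
)
```

The division by the estimated slope is the source of the difficulties discussed next: whenever the slope estimate can come arbitrarily close to zero, the estimate of x can become arbitrarily large.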

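It can be seen immediately that $\hat x_C$ coincides with the maximum likelihood estimator for $\sigma^2 > 0$, $\tau^2 > 0$ and normally distributed $u_i$, $v_j$: The likelihood function of $\alpha$, $\beta$, $\sigma$, $\tau$ and x is given by

$$L(y_1,\ldots,y_n, z_1,\ldots,z_m \mid \alpha,\beta,\sigma,\tau,x) = (2\pi\sigma^2)^{-n/2} \exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2\Bigr) \cdot (2\pi\tau^2)^{-m/2} \exp\Bigl(-\frac{1}{2\tau^2}\sum_{j=1}^{m}(z_j - \alpha - \beta x)^2\Bigr).$$

The partial derivatives $\partial L/\partial\alpha$, $\partial L/\partial\beta$ and $\partial L/\partial x$, set equal to zero, yield the equations

$$\frac{1}{\sigma^2}\sum_{i}(y_i - \alpha - \beta x_i) + \frac{1}{\tau^2}\sum_{j}(z_j - \alpha - \beta x) = 0,$$

$$\frac{1}{\sigma^2}\sum_{i}(y_i - \alpha - \beta x_i)\,x_i + \frac{1}{\tau^2}\sum_{j}(z_j - \alpha - \beta x)\,x = 0,$$

$$\frac{1}{\tau^2}\sum_{j}(z_j - \alpha - \beta x)\,\beta = 0.$$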
By exclusion of $\beta = 0$ one obtains $\sum_j (z_j - \alpha - \beta x) = 0$. Hence the first two equations reduce to the usual equations for the least squares estimators $\hat\alpha$ and $\hat\beta$. The solution of the third equation is then given by $\hat x_C$.

This 'classical' estimator cannot be judged by the criterion of minimizing the mean square deviation, however, because

$$E\bigl((\hat x_C - x)^2 \mid \alpha,\beta,\sigma,\tau,x\bigr) = +\infty,$$

and, furthermore, because $\hat x_C$ has an undefined expectation value.

Krutchkoff (1967) proposed the inverse estimator defined by

$$\hat x_I = \hat\gamma + \hat\delta\,\bar z,$$

where

$$\hat\delta = \frac{\sum_{i}(y_i - \bar y)(x_i - \bar x)}{\sum_{i}(y_i - \bar y)^2}, \qquad \hat\gamma = \bar x - \hat\delta\,\bar y$$

are the least squares estimators of the slope and intercept when the $x_i$'s are formally regressed on the $y_i$'s. Although the mean square error of $\hat x_I$ is finite, Williams (1969) doubted the relevance of $\hat x_I$. He showed that if $\sigma^2$ ($= \tau^2$) and the sign of $\beta$ are known, then the unique unbiased estimator of x has an infinite variance. This result led him to the conclusion that, since any estimator that could be derived in a theoretically justifiable manner would have an infinite variance, the fact that Krutchkoff's estimator had a finite variance seemed to be of little account.
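The relation between the two point estimators can be made concrete with a small sketch. Under the definitions above, $\hat x_I - \bar x = \hat\delta\hat\beta\,(\hat x_C - \bar x)$, and $\hat\delta\hat\beta$ is the squared sample correlation between the $x_i$ and $y_i$, so the inverse estimator shrinks the classical one toward $\bar x$. The function name and the data below are hypothetical.

```python
def classical_and_inverse(xs, ys, zs):
    """Return (x_C, x_I, r2), where r2 is the squared sample correlation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    z_bar = sum(zs) / len(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx                   # regression of y on x
    alpha_hat = y_bar - beta_hat * x_bar
    delta_hat = sxy / syy                  # regression of x on y
    gamma_hat = x_bar - delta_hat * y_bar
    x_c = (z_bar - alpha_hat) / beta_hat   # classical estimator
    x_i = gamma_hat + delta_hat * z_bar    # inverse estimator
    return x_c, x_i, beta_hat * delta_hat


# Hypothetical noisy calibration data and a single new response.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.2, 2.8, 5.1, 6.9]
x_c, x_i, r2 = classical_and_inverse(xs, ys, [9.0])
x_bar = sum(xs) / len(xs)
```

Because the shrinkage factor never exceeds one, the inverse estimate always lies between $\bar x$ and the classical estimate, which is one way to see why its mean square error stays finite.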

Williams suggested using confidence limits, which should provide what is required for inverse linear regression. Hence the two papers of Perng and Tong (1974 and 1977) could meet his approval. They treated the problem of the allocation of n and m for the interval estimation of x such that the probability of coverage is maximized when the total number of observations n+m is fixed and large.




An independent discussion of the inverse linear regression problem was given by Hoadley (1970) for $\sigma = \tau$. Part of his results will be presented in the following.


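Without loss of generality it is assumed that

$$\sum_{i=1}^{n} x_i = 0, \qquad \frac{1}{n}\sum_{i=1}^{n} x_i^2 = 1.$$

The maximum likelihood estimators of $\sigma^2$ based on y alone, z alone and both y and z are

$$v_1 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat\alpha - \hat\beta x_i)^2, \qquad v_2 = \frac{1}{m-1}\sum_{j=1}^{m}(z_j - \bar z)^2,$$

$$v = \frac{1}{n-2+m-1}\,\bigl[(n-2)\,v_1 + (m-1)\,v_2\bigr].$$

The F-statistic, defined by

$$F = \frac{n\,\hat\beta^2}{v},$$

where $\hat\beta$ is the maximum likelihood estimator of $\beta$, is often used for testing the hypothesis $\beta = 0$, as under this hypothesis F is F-distributed with 1 and n+m-3 degrees of freedom.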

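In the case m=1 a confidence set S is derived from the fact that

$$\frac{z - \hat\alpha - \hat\beta x}{\sqrt{v\,(n+1+x^2)/n}}$$

has a t-distribution with n-2 degrees of freedom. If $F_{\alpha;1,\nu}$ is the upper $\alpha$ point of the F-distribution with 1 and $\nu$ degrees of freedom, one gets

$$S = \begin{cases} \{x : x_L \le x \le x_U\} & \text{if } F > F_{\alpha;1,n-2}, \\ \{x : x \le x_L\} \cup \{x : x \ge x_U\} & \text{if } \dfrac{n+1}{n+1+\hat x_C^2}\,F_{\alpha;1,n-2} \le F < F_{\alpha;1,n-2}, \\ (-\infty, +\infty) & \text{if } F < \dfrac{n+1}{n+1+\hat x_C^2}\,F_{\alpha;1,n-2}, \end{cases}$$

where $x_L$ and $x_U$ are equal to

$$\frac{F\,\hat x_C}{F - F_{\alpha;1,n-2}} \pm \frac{\bigl\{F_{\alpha;1,n-2}\,\bigl[(n+1)(F - F_{\alpha;1,n-2}) + F\,\hat x_C^2\bigr]\bigr\}^{1/2}}{F - F_{\alpha;1,n-2}}.$$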


A graphical display of S is given in Figure 1 for n=9, $\hat x_C = 1$. As we see, this confidence set is not very helpful if

$$F < \frac{n+1}{n+1+\hat x_C^2}\,F_{\alpha;1,n-2}.$$

In this case $\hat\beta$ is not significantly different from zero, which may tempt one to conclude that the data provide no information about x.
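The three shapes of the confidence set can be reproduced with a short sketch. It assumes m=1 and the standardization of the $x_i$ used above, and it follows from rewriting the t-statistic condition as $F(\hat x_C - x)^2 \le F_{\alpha;1,n-2}\,(n+1+x^2)$. The function name is hypothetical, and the critical value must be supplied by the caller; the value 5.59 used below is the approximate tabulated upper 5 % point of F(1,7).

```python
import math


def confidence_set(F, F_alpha, n, x_c):
    """Return (kind, lo, hi) describing the confidence set S for m = 1.

    kind == "interval":   S = [lo, hi]
    kind == "two rays":   S = (-inf, lo] united with [hi, +inf)
    kind == "whole line": S is the whole real line
    """
    threshold = (n + 1) / (n + 1 + x_c ** 2) * F_alpha
    if F < threshold:
        return ("whole line", -math.inf, math.inf)
    if F == F_alpha:
        raise ValueError("F equals F_alpha: the quadratic degenerates")
    disc = F_alpha * ((n + 1) * (F - F_alpha) + F * x_c ** 2)
    root = math.sqrt(max(0.0, disc))
    a = (F * x_c - root) / (F - F_alpha)
    b = (F * x_c + root) / (F - F_alpha)
    kind = "interval" if F > F_alpha else "two rays"
    return (kind, min(a, b), max(a, b))


# n = 9 and x_C = 1 as in Figure 1; F_alpha = 5.59 is an approximate table value.
strong = confidence_set(20.0, 5.59, 9, 1.0)
weak = confidence_set(5.3, 5.59, 9, 1.0)
useless = confidence_set(1.0, 5.59, 9, 1.0)
```

Only the first case yields a bounded interval; as F drops below the critical value the set first becomes the complement of an interval and finally the whole real line.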


Figure 1: Comparison of 95 % Confidence Set and 95 % Shortest Posterior Interval (SPI) for n=9, $\hat x_C = 1$ (after Hoadley (1970)).


THE  BAYESIAN  APPROACH  BY  HOADLEY 

The situation is changed substantially if Bayesian rules are admitted. Since Bayesian rules are usually biased, the absence of an unbiased estimator for x with finite variance does not matter. Furthermore, whenever F>0 a shortest posterior interval can be obtained from the posterior distribution of x after the observation of $y_1,\ldots,y_n$ and $z_1,\ldots,z_m$.

For the sake of completeness some properties of Bayesian rules will be derived as given, e.g., by Ferguson (1967). Let $\theta \in \Theta$ denote the state chosen by nature. Given the prior distribution $\psi$ on $\Theta$, we want to choose a non-randomized decision rule d that minimizes the Bayesian risk

$$r(\psi, d) := \int \Bigl[\int \ell(\theta, d(k))\, dF_K(k \mid \theta)\Bigr]\, d\psi(\theta),$$

where $\ell(\cdot)$ denotes the loss function and $F_K(\cdot \mid \theta)$ the distribution function of the observation K conditional on the chosen $\theta$. A choice of $\theta$ by the distribution $\psi$, followed by a choice of the observation K from the distribution $F_K(\cdot \mid \theta)$, determines in general a joint distribution of $\theta$ and K, which in turn can be determined in general by first choosing K according to its marginal distribution

$$F_K(k) = \int F_K(k \mid \theta)\, d\psi(\theta)$$

and then choosing $\theta$ according to the conditional distribution of $\theta$ given K=k, $\psi(\cdot \mid k)$. Hence by a change in the order of integration we may write

$$r(\psi, d) = \int \Bigl[\int \ell(\theta, d(k))\, d\psi(\theta \mid k)\Bigr]\, dF_K(k).$$

Given that these operations are admitted, it is now easy to describe a Bayesian decision rule. To find a function $d(\cdot)$ that minimizes the last double integral, we may minimize the inside integral separately for each k; that is, we may find for each k the decision, call it d(k), that minimizes

$$\int \ell(\theta, d(k))\, d\psi(\theta \mid k),$$

i.e., the Bayesian decision rule minimizes the posterior conditional expected loss, given the observation.

In the case of the inverse linear regression problem let $p(\theta)$ and $p(\theta \mid \text{data})$ denote the prior and posterior density of the unknown parameter $\theta$, respectively. It is assumed that $(\alpha, \beta, \ln\sigma)$ has a uniform distribution, i.e.,

$$p(\alpha, \beta, \sigma^2) \propto \frac{1}{\sigma^2}. \quad {}^{*)}$$

The most important results of Hoadley are given in the form of the following two theorems.


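Theorem 1

Suppose that, a priori, x is independent of $(\alpha, \beta, \sigma^2)$, and that the prior distribution of $(\alpha, \beta, \sigma^2)$ is specified by

$$p(\alpha, \beta, \sigma^2) \propto \frac{1}{\sigma^2}.$$

Then the posterior density of x is given by

$$p(x \mid y_1,\ldots,y_n, z_1,\ldots,z_m) \propto p(x) \cdot L(x),$$

where

$$L(x) = \frac{\bigl(1 + \frac{n}{m} + x^2\bigr)^{(m+n-3)/2}}{\Bigl[1 + \frac{n}{m} + R\,\hat x_C^2 + \bigl(\frac{F}{m+n-3} + 1\bigr)\,(x - R\,\hat x_C)^2\Bigr]^{(m+n-2)/2}}$$

and where

$$R = \frac{F}{F + m + n - 3}.$$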

The  function  L(.)  is  a  kind  of  likelihood  function  representing  the 
information  about  x  obtained  from  all  sources  except  for  the  prior  distri¬ 
bution  of  x.  As  it  turns  out  L(.)  has  a  lot  of  unpleasant  properties.  It 
seems  that  a  proper  prior  for  x  is  a  prerequisite  to  sensible  use  of  the 
Bayesian  solution  in  the  preceding  theorem. 
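One of the unpleasant properties of $L(\cdot)$ can be checked numerically. The sketch below evaluates the function of Theorem 1 (the name `hoadley_L` and the numerical inputs are hypothetical) and illustrates that $L(x)$ decays only like $1/|x|$ for large $|x|$, so that it cannot be normalized to a density without a proper prior for x.

```python
import math


def hoadley_L(x, x_c, F, n, m):
    """Evaluate L(x) of Theorem 1 (up to a constant factor)."""
    nu = m + n - 3
    R = F / (F + nu)
    base = 1.0 + n / m
    numerator = (base + x * x) ** (nu / 2)
    bracket = base + R * x_c ** 2 + (F / nu + 1.0) * (x - R * x_c) ** 2
    return numerator / bracket ** ((nu + 1) / 2)


# Hypothetical inputs: n = 9 calibration points, m = 1 response, F = 4, x_C = 1.
n, m, F, x_c = 9, 1, 4.0, 1.0
nu = m + n - 3
far = 1.0e6
tail_constant = far * hoadley_L(far, x_c, F, n, m)
expected_limit = (1.0 + F / nu) ** (-(nu + 1) / 2)
```

Here `tail_constant` approximates the limit of $|x| \cdot L(x)$; since that limit is positive, the integral of $L$ over the real line diverges logarithmically.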

*) In Bayesian inference, the notation $u \propto v$ indicates that the function u is equal to v up to a proportionality factor.

In the case m=1 the inverse estimator $\hat x_I$ can be characterized by the following theorem.

Theorem 2

If, a priori,

$$x = t_{n-3}\,\sqrt{\frac{n+1}{n-3}},$$

where the random variable $t_{n-3}$ has a t-distribution with n-3 degrees of freedom, then, a posteriori, x conditional on $y_1,\ldots,y_n$, $z_1,\ldots,z_m$ has the same distribution as

$$\hat x_I + t_{n-2}\,\sqrt{\frac{n+1+\hat x_I^2/R}{F+n-2}},$$

where $t_{n-2}$ has a t-distribution with n-2 degrees of freedom.
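The statement of Theorem 2 can be checked numerically against Theorem 1. The sketch below assumes m=1, the standardization of the $x_i$ used by Hoadley, and the resulting identity $\hat x_I = R\,\hat x_C$; all function names and inputs are hypothetical. It compares the unnormalized posterior (prior kernel times $L(x)$) with the kernel of a shifted and scaled t-distribution with n-2 degrees of freedom; the ratio should not depend on x.

```python
import math


def posterior_kernel(x, x_c, F, n):
    """Prior kernel of Theorem 2 times L(x) of Theorem 1, for m = 1."""
    nu = n - 2
    R = F / (F + nu)
    prior = (n + 1 + x * x) ** (-nu / 2)      # kernel of t_{n-3} * sqrt((n+1)/(n-3))
    likelihood = (n + 1 + x * x) ** (nu / 2) / (
        n + 1 + R * x_c ** 2 + (F / nu + 1.0) * (x - R * x_c) ** 2
    ) ** ((nu + 1) / 2)
    return prior * likelihood


def t_kernel(x, x_c, F, n):
    """Kernel of x_I + t_{n-2} * sqrt((n + 1 + x_I^2 / R) / (F + n - 2))."""
    nu = n - 2
    R = F / (F + nu)
    x_i = R * x_c
    scale2 = (n + 1 + x_i ** 2 / R) / (F + nu)
    return (1.0 + (x - x_i) ** 2 / (scale2 * nu)) ** (-(nu + 1) / 2)


n, F, x_c = 9, 6.0, 1.5
ratios = [posterior_kernel(x, x_c, F, n) / t_kernel(x, x_c, F, n)
          for x in (-10.0, -1.0, 0.0, 0.7, 3.0, 25.0)]
```

A constant ratio means that both kernels describe the same distribution, so the posterior is centered at the inverse estimator $\hat x_I$.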


This theorem provides a better understanding of the inverse estimator $\hat x_I$ as well as of Bayesian estimators in general. It seems that this result has not yet been extended to a broader class of informative priors because of technical difficulties. The papers by Halperin (1970), Kalotay (1971) and Martinelle (1970) treat other aspects and do not extend the Bayesian approach.

The following two approaches also start from a Bayesian point of view. By restricting the class of admitted estimators they need only the knowledge of some moments instead of the whole a priori distribution of $\alpha$, $\beta$, $\sigma$, $\tau$, and x.


THE TWO-STAGE LINEAR BAYESIAN APPROACH ACCORDING TO AVENHAUS AND JEWELL (1975)


With the help of this approach the problem is solved in two stages.1) At the first stage estimators for $\alpha$ and $\beta$, which are linear in $y_1,\ldots,y_n$, are constructed in such a way that a quadratic loss function is minimized. At the second stage an estimator for x, which is linear in the only observation $z_1$, is constructed in such a way that a second quadratic functional is minimized. Since the a priori expected value of the variance $\sigma^2$ is not updated, only the a priori first and second moments of $\alpha$, $\beta$, $\sigma$, $\tau$, and x are needed.

Generally the procedure may be described as follows: Let $\theta \in \Theta$ denote the state chosen by nature, and let $\psi$ denote the prior distribution on $\Theta$. Using a quadratic loss function

$$\ell(\theta, d) = \text{const.} \cdot (\theta - d)^2$$

for the decision d, the posterior quadratic loss $E(\ell(\theta, d) \mid K=k)$ for a given observation K=k is merely the second moment about d of the posterior distribution of $\theta$ given k:

$$E(\ell(\theta, d) \mid K=k) = \int \ell(\theta, d)\, d\psi(\theta \mid k) = \text{const.} \cdot \int (\theta - d)^2\, d\psi(\theta \mid k).$$

This posterior quadratic loss is minimized by taking d as the mean of the posterior distribution of $\theta$ given k. Hence the Bayesian decision rule is

$$d(k) = E(\theta \mid K=k).$$
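That the posterior mean minimizes the posterior quadratic loss can be illustrated with a toy example. The three-point posterior below is purely hypothetical; the loss is evaluated on a grid of candidate decisions and the minimizer is compared with the posterior mean.

```python
def expected_quadratic_loss(d, thetas, probs):
    """Posterior expected loss E((theta - d)^2 | K = k) for a discrete posterior."""
    return sum(p * (t - d) ** 2 for t, p in zip(thetas, probs))


# Hypothetical discrete posterior distribution of theta given K = k.
thetas = [0.0, 1.0, 3.0]
probs = [0.2, 0.5, 0.3]

posterior_mean = sum(p * t for t, p in zip(thetas, probs))
grid = [i / 100 for i in range(0, 301)]
best_on_grid = min(grid, key=lambda d: expected_quadratic_loss(d, thetas, probs))
```

The reason is the decomposition $E((\theta - d)^2 \mid k) = \operatorname{var}(\theta \mid k) + (E(\theta \mid k) - d)^2$, whose second term vanishes exactly at the posterior mean.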


This procedure will now be applied to the Bayesian version of the inverse linear regression problem, which will be presented once more for the sake of clarity.

2m+2n+5 random variables

$$\alpha, \beta, \sigma, \tau, x, u_1,\ldots,u_n, y_1,\ldots,y_n, v_1,\ldots,v_m, z_1,\ldots,z_m$$

are considered which are defined on a probability space $(\Omega, \mathfrak{F}, P)$. It is assumed that the random vectors $(\alpha, \beta, \sigma, \tau)$, x, $u_1,\ldots,u_n$, $v_1,\ldots,v_m$ are stochastically independent and that the following equations hold:

$$y_i = \alpha + \beta x_i + \sigma u_i, \quad i = 1,\ldots,n,$$

$$z_j = \alpha + \beta x + \tau v_j, \quad j = 1,\ldots,m.$$

It is assumed that the first and second moments of $u_i$ and $v_j$ are known:

$$E(u_i) = E(v_j) = 0, \quad E(u_i^2) = E(v_j^2) = 1; \quad i = 1,\ldots,n, \; j = 1,\ldots,m.$$

1) In the original paper by Avenhaus and Jewell (1975) only the case $\sigma = \tau$ and m=1 was considered.

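In the model of decision theory the sample space is the (n+m)-dimensional Euclidean space; the statistician chooses a decision function d,

$$d: \mathbb{R}^{m+n} \to \mathbb{R},$$

which gives for each observation of values of $y_1,\ldots,y_n, z_1,\ldots,z_m$ an estimate for x, in such a way that the Bayesian risk belonging to a loss functional $\ell$,

$$\ell: \mathbb{R} \times \mathbb{R} \to \mathbb{R},$$

is minimized: Let $\psi_{\alpha,\beta,\sigma,\tau,x}$ be the a priori distribution of $\alpha$, $\beta$, $\sigma$, $\tau$ and x, and let $P_{y_1,\ldots,y_n,z_1,\ldots,z_m}(\cdot \mid \alpha,\beta,\sigma,\tau,x)$ be the conditional distribution of $y_1,\ldots,y_n, z_1,\ldots,z_m$ given $\alpha$, $\beta$, $\sigma$, $\tau$ and x. Then the Bayesian risk, defined by

$$r(\psi, d(\cdot)) = \int R(\alpha', \beta', \sigma', \tau', x'; d(\cdot))\, d\psi_{\alpha,\beta,\sigma,\tau,x}(\alpha', \beta', \sigma', \tau', x'),$$

where $R(\cdot)$ is defined by

$$R(\alpha', \beta', \sigma', \tau', x'; d(\cdot)) = \int \ell\bigl(x', d(s_1,\ldots,s_n, t_1,\ldots,t_m)\bigr)\, dP_{y_1,\ldots,y_n,z_1,\ldots,z_m}(s_1,\ldots,t_m \mid \alpha', \beta', \sigma', \tau', x'),$$

is to be minimized.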

It has been pointed out already that in the case of a quadratic loss function

$$\ell(x, d) = \text{const.} \cdot (x - d)^2$$

the solution of the minimization problem is

$$E(x \mid y_i = s_i, \; z_j = t_j, \; i = 1,\ldots,n, \; j = 1,\ldots,m).$$


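The first theorem of Hoadley given in the preceding chapter highlights the complexity of this conditional expectation. Therefore, Avenhaus and Jewell (1975) use at the first stage of their approach the following approximate estimate for x, which is extended here to arbitrary m,

$$\hat x_{AJ} = c_0(y_1,\ldots,y_n) + \sum_{j=1}^{m} c_j(y_1,\ldots,y_n)\, z_j.$$

The functions

$$c_j: \mathbb{R}^n \to \mathbb{R}, \quad j = 0, 1, \ldots, m,$$

are determined in such a way that the mean square error of $\hat x_{AJ}$, which with $z_0 := 1$ and using the definition of the conditional expectation is given by

$$E\Bigl(x - \sum_{j=0}^{m} c_j(y_1,\ldots,y_n)\, z_j\Bigr)^2 = \int E\Bigl(\bigl(x - \sum_{j=0}^{m} c_j(y_1,\ldots,y_n)\, z_j\bigr)^2 \Bigm| y_1 = s_1,\ldots,y_n = s_n\Bigr)\, dP_{y_1,\ldots,y_n}(s_1,\ldots,s_n),$$

is minimized. This is performed by first minimizing the conditional expectation of the mean square error, given by

$$E\Bigl(\bigl(x - \sum_{j=0}^{m} c_j(\cdot)\, z_j\bigr)^2 \Bigm| y_1 = s_1,\ldots,y_n = s_n\Bigr) = \int \Bigl(r - \sum_{j=0}^{m} c_j(\cdot)\, t_j\Bigr)^2 dP_{x,z_1,\ldots,z_m}(r, t_1,\ldots,t_m \mid y_1 = s_1,\ldots,y_n = s_n),$$

with $t_0 := 1$. Differentiation with respect to $c_0, c_1, \ldots, c_m$ gives, up to a constant factor,

$$\int \Bigl(r - \sum_{j=0}^{m} c_j t_j\Bigr)\, dP = E(x) - c_0 - \sum_{j=1}^{m} c_j\, E(z_j \mid y_1 = s_1,\ldots,y_n = s_n),$$

$$\int \Bigl(r - \sum_{j=0}^{m} c_j t_j\Bigr)\, t_l\, dP = E(x z_l \mid y_1 = s_1,\ldots,y_n = s_n) - c_0\, E(z_l \mid y_1 = s_1,\ldots,y_n = s_n) - \sum_{j=1}^{m} c_j\, E(z_j z_l \mid y_1 = s_1,\ldots,y_n = s_n), \quad l = 1,\ldots,m.$$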
. vV  - 
Setting these derivatives equal to zero, we obtain the following necessary and sufficient conditions for $c_0, c_1, \ldots, c_m$:

$$c_0(s_1,\ldots,s_n) = E(x) - \sum_{j=1}^{m} c_j(s_1,\ldots,s_n)\, E(z_j \mid y_1 = s_1,\ldots,y_n = s_n),$$

$$\sum_{j=1}^{m} c_j(s_1,\ldots,s_n)\, \operatorname{cov}(z_j, z_l \mid y_1 = s_1,\ldots,y_n = s_n) = \operatorname{cov}(x, z_l \mid y_1 = s_1,\ldots,y_n = s_n), \quad l = 1,\ldots,m.$$


Actually it is not necessary to consider this system of m+1 unknowns $c_0, c_1, \ldots, c_m$, since all relevant information of the sequence $z_1,\ldots,z_m$ is contained in the mean value

$$\bar z := \frac{1}{m}\sum_{j=1}^{m} z_j.$$

This can be proven as follows: Let $\bar c_j(s_1,\ldots,s_n)$, j=0,1, denote the minimizing coefficients for the case m=1. If we write $z := \bar z$, then the $\bar c_j$, j=0,1, are given by

$$\bar c_0(s_1,\ldots,s_n) = E(x) - \bar c_1(s_1,\ldots,s_n)\, E(z \mid y_1 = s_1,\ldots,y_n = s_n),$$

$$\bar c_1(s_1,\ldots,s_n) = \frac{\operatorname{cov}(x, z \mid y_1 = s_1,\ldots,y_n = s_n)}{\operatorname{var}(z \mid y_1 = s_1,\ldots,y_n = s_n)}.$$

Now it can be verified easily that in the general case m>1

$$c_0(s_1,\ldots,s_n) = \bar c_0(s_1,\ldots,s_n),$$

$$c_j(s_1,\ldots,s_n) = \frac{1}{m}\,\bar c_1(s_1,\ldots,s_n), \quad j = 1,\ldots,m,$$

solve the system of equations given above. Hence it suffices to consider $\bar z$, which means that the estimator can be written as

$$\hat x_{AJ} = \bar c_0(y_1,\ldots,y_n) + \bar c_1(y_1,\ldots,y_n) \cdot \bar z.$$


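Explicitly the terms which are contained in this solution are given as follows (for the sake of simplicity we write $y_i = s_i$ instead of $y_1 = s_1,\ldots,y_n = s_n$):

$$E(z_1 \mid y_i = s_i) = E(\alpha \mid y_i = s_i) + E(\beta \mid y_i = s_i) \cdot E(x),$$

$$\operatorname{cov}(x, z_1 \mid y_i = s_i) = E(\beta \mid y_i = s_i) \cdot \operatorname{var}(x),$$

$$\operatorname{cov}(z_j, z_l \mid y_i = s_i) = \operatorname{var}(z_1 \mid y_i = s_i) - E(\tau^2 \mid y_i = s_i), \quad j \ne l,$$

$$\operatorname{var}(z_1 \mid y_i = s_i) = \operatorname{var}(\alpha \mid y_i = s_i) + 2\,E(x)\,\operatorname{cov}(\alpha, \beta \mid y_i = s_i) + \operatorname{var}(x)\,\bigl(E(\beta \mid y_i = s_i)^2 + \operatorname{var}(\beta \mid y_i = s_i)\bigr) + (E(x))^2\,\operatorname{var}(\beta \mid y_i = s_i) + E(\tau^2 \mid y_i = s_i).$$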
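The formulas above can be combined into a small routine that returns the two coefficients of the estimator based on the mean value $\bar z$. This is a sketch under stated assumptions: the function name and argument names are hypothetical, and the variance of $\bar z$ is obtained from the expressions above by replacing $E(\tau^2 \mid y_i = s_i)$ with $E(\tau^2 \mid y_i = s_i)/m$, which follows from averaging m observations with common covariance.

```python
def aj_coefficients(e_alpha, e_beta, var_alpha, var_beta, cov_ab,
                    e_tau2, e_x, var_x, m):
    """Return (c0, c1) such that the estimate is c0 + c1 * z_bar.

    The first six arguments are the conditional moments of alpha, beta and
    tau^2 given the calibration data; e_x and var_x are prior moments of x.
    """
    mean_z = e_alpha + e_beta * e_x
    cov_xz = e_beta * var_x
    var_z_bar = (var_alpha
                 + 2.0 * e_x * cov_ab
                 + var_x * (e_beta ** 2 + var_beta)
                 + e_x ** 2 * var_beta
                 + e_tau2 / m)
    c1 = cov_xz / var_z_bar
    c0 = e_x - c1 * mean_z
    return c0, c1


# Limiting case: alpha = 1 and beta = 2 known exactly, no measurement error.
c0, c1 = aj_coefficients(1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 3.0, 4.0, 1)
```

In this limiting case the estimator reduces to $(\bar z - \alpha)/\beta$, that is, to the classical estimator with known parameters; with positive error variance the slope `c1` becomes smaller and the estimate is pulled toward the prior mean of x.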


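The remaining problem, which represents the second stage of this approach, is to determine the conditional moments

$$E(\alpha \mid y_i = s_i), \; E(\beta \mid y_i = s_i), \; \operatorname{var}(\alpha \mid y_i = s_i), \; \operatorname{cov}(\alpha, \beta \mid y_i = s_i), \; \operatorname{var}(\beta \mid y_i = s_i) \text{ and } E(\tau^2 \mid y_i = s_i).$$

Avenhaus and Jewell do not use the observations of $y_i$ in order to get a better estimate for $\sigma^2$; instead they replace $E(\sigma^2 \mid y_i = s_i)$ by the a priori moment $E(\sigma^2)$. All other terms are estimated by means of linear estimators for $\alpha$ and $\beta$,

$$\hat\alpha_B = a_0 + \sum_{i=1}^{n} a_i\, y_i, \qquad \hat\beta_B = b_0 + \sum_{i=1}^{n} b_i\, y_i,$$

in such a way that the expectation of the quadratic loss function,

$$E\Bigl(\bigl(\alpha - a_0 - \sum_{i} a_i y_i\bigr)^2 + \bigl(\beta - b_0 - \sum_{i} b_i y_i\bigr)^2\Bigr),$$

is minimized with respect to the unknown $a_0$, $a_i$, $b_0$ and $b_i$. This leads to the following system of equations

$$a_0 = E(\alpha) - \sum_{i=1}^{n} a_i\,\bigl(E(\alpha) + E(\beta)\,x_i\bigr),$$

$$b_0 = E(\beta) - \sum_{i=1}^{n} b_i\,\bigl(E(\alpha) + E(\beta)\,x_i\bigr),$$

$$\sum_{j=1}^{n} a_j\,\operatorname{cov}(y_i, y_j) = \operatorname{cov}(\alpha, \alpha + \beta x_i), \quad i = 1,\ldots,n,$$

$$\sum_{j=1}^{n} b_j\,\operatorname{cov}(y_i, y_j) = \operatorname{cov}(\beta, \alpha + \beta x_i), \quad i = 1,\ldots,n.$$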

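It can be shown (Jewell 1975) that the solution can be written as

$$\begin{pmatrix} \hat\alpha_B \\ \hat\beta_B \end{pmatrix} = (I_2 - M) \begin{pmatrix} E(\alpha) \\ E(\beta) \end{pmatrix} + M \begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix},$$

where

$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad M = C\,X^T X\,\bigl[I_2 \cdot E(\sigma^2) + C\,X^T X\bigr]^{-1},$$

$$C = \begin{pmatrix} \operatorname{var}(\alpha) & \operatorname{cov}(\alpha, \beta) \\ \operatorname{cov}(\alpha, \beta) & \operatorname{var}(\beta) \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix},$$

$$\begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix} = (X^T X)^{-1} X^T y, \qquad y = (y_1,\ldots,y_n)^T.$$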
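Now, $E(\alpha \mid y_i = s_i)$ and $E(\beta \mid y_i = s_i)$ are estimated by $\hat\alpha_B(s_1,\ldots,s_n)$ and $\hat\beta_B(s_1,\ldots,s_n)$, respectively. The second moments of $\alpha$ and $\beta$, i.e. the covariance matrix

$$\begin{pmatrix} \operatorname{var}(\alpha \mid y_i = s_i) & \operatorname{cov}(\alpha, \beta \mid y_i = s_i) \\ \operatorname{cov}(\alpha, \beta \mid y_i = s_i) & \operatorname{var}(\beta \mid y_i = s_i) \end{pmatrix},$$

is estimated by $E(\sigma^2) \cdot M \cdot (X^T X)^{-1}$.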
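The credibility form of the second stage can be sketched with plain 2x2 matrix arithmetic. The helper names, the prior moments and the data below are hypothetical. The posterior covariance line uses the standard linear Bayes identity $E(\sigma^2)\,M\,(X^T X)^{-1} = (C^{-1} + X^T X / E(\sigma^2))^{-1}$, which is an assumption about the intended normalization rather than a statement taken from the original report.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def credibility_estimate(xs, ys, prior_mean, C, sigma2):
    """Return (estimate, M, posterior_cov) for the pair (alpha, beta)."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    A = [[n, sx], [sx, sxx]]                       # X^T X
    A_inv = inv2(A)
    ls = [A_inv[0][0] * sy + A_inv[0][1] * sxy,    # least squares (alpha, beta)
          A_inv[1][0] * sy + A_inv[1][1] * sxy]
    CA = mul2(C, A)
    M = mul2(CA, inv2([[sigma2 + CA[0][0], CA[0][1]],
                       [CA[1][0], sigma2 + CA[1][1]]]))
    estimate = [sum(((1.0 if i == j else 0.0) - M[i][j]) * prior_mean[j]
                    + M[i][j] * ls[j] for j in range(2)) for i in range(2)]
    posterior_cov = [[sigma2 * v for v in row] for row in mul2(M, A_inv)]
    return estimate, M, posterior_cov


# Hypothetical inputs: centered design, prior mean (0, 1), E(sigma^2) = 1.
est, M, post_cov = credibility_estimate(
    [-1.0, 0.0, 1.0], [-1.0, 1.0, 3.0], [0.0, 1.0],
    [[1.0, 0.0], [0.0, 0.5]], 1.0
)
```

The matrix M plays the role of a credibility weight: it moves toward the identity as the prior covariance C grows or the number of calibration points increases, so that the estimate moves from the prior means toward the least squares values.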

As already mentioned, the method does not use an a posteriori estimate for $\sigma^2$. This might easily be changed if one assumes that the a priori estimate was derived from a trial with a known number N of observations $y_1, y_2, \ldots$ Also, multiple observations of z could be used for the estimation of $\sigma^2$ in the case of $\sigma = \tau$. Furthermore, it has to be reconsidered whether or not the loss function for the estimation of $\alpha$ and $\beta$ is appropriate.
A QUADRATIC BAYES APPROACH

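This approach tries to maintain the property of linear Bayes estimators insofar as only some moments have to be known and not an a priori distribution of $\alpha$, $\beta$, $\sigma$, $\tau$ and x.

The idea is the following: Instead of estimating the parameters $\alpha$, $\beta$ and $\tau$ of the relation

$$z_j = \alpha + \beta x + \tau v_j, \quad j = 1,\ldots,m,$$

the parameters of the transformed relation

$$x = \gamma + \delta z_j + w_j, \quad j = 1,\ldots,m,$$

where

$$\gamma = -\frac{\alpha}{\beta}, \qquad \delta = \frac{1}{\beta}, \qquad w_j = -\frac{\tau}{\beta}\,v_j,$$

are estimated by estimators which are linear in $y_i$, i=1,...,n. Explicitly the estimator for x is given by

$$\hat x_Q = \hat\gamma + \sum_{j=1}^{m} \hat\delta_j\, z_j,$$

where the estimators $\hat\gamma$ and $\hat\delta_j$,

$$\hat\gamma = d_{00} + \sum_{i=1}^{n} d_{i0}\, y_i, \qquad \hat\delta_j = d_{0j} + \sum_{i=1}^{n} d_{ij}\, y_i, \quad j = 1,\ldots,m,$$

are determined in such a way that the Bayes risk belonging to the quadratic loss,

$$\int \Bigl(-x + \hat\gamma + \sum_{j=1}^{m} \hat\delta_j z_j\Bigr)^2 dP = \int \Bigl(-x + \sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, y_i\, z_j\Bigr)^2 dP,$$

where $z_0 = y_0 = 1$, is minimized. The solution yields a quadratic estimator

$$\hat x_Q = \sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, y_i\, z_j,$$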
the coefficients of which are the solution of the following system of equations

$$\int \Bigl(-x + \sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, y_i\, z_j\Bigr)\, y_k\, z_l\, dP = 0, \quad k = 0,\ldots,n; \; l = 0,\ldots,m,$$

which is obtained by differentiating the Bayes risk partially with regard to the parameters $d_{kl}$. In terms of the moments of $y_i$ and $z_j$ this system of equations has the form

$$\sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, E(y_i\, y_k\, z_j\, z_l) = E(x\, y_k\, z_l), \quad k = 0,\ldots,n; \; l = 0,\ldots,m,$$

which means that only the first four moments are needed.


Role of Observations $z_j$


It seems plausible that each observation $z_j$, j=1,...,m, should have the same importance for a 'best' estimator of x. Therefore we replace $z_1$ in the case of m=1 by the mean value

$$\bar z = \frac{1}{m}\sum_{j=1}^{m} z_j$$

and ask for the risk minimizing parameters $\bar d_{ij}$, i=0,...,n, j=0,1, of the estimators

$$\hat\gamma = \bar d_{00} + \sum_{i=1}^{n} \bar d_{i0}\, y_i, \qquad \hat\delta = \bar d_{01} + \sum_{i=1}^{n} \bar d_{i1}\, y_i$$

for $\gamma$ and $\delta$ of

$$x := \gamma + \delta\,\bar z + \bar w.$$


Since the risk is a convex and quadratic function of the $\bar d_{ij}$, the optimal $\bar d_{ij}$ are completely determined as solutions of

$$\sum_{j=0}^{1}\sum_{i=0}^{n} \bar d_{ij}\, E(y_i\, y_k\, \bar z^{\,j+l}) = E(x\, y_k\, \bar z^{\,l}), \quad k = 0,\ldots,n; \; l = 0, 1,$$
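where $\bar z^{\,0} := 1$. Since

$$E(y_i\, y_k\, z_j\, z_l) = E(y_i\, y_k\, \bar z^2) + \Bigl(\chi(j,l) - \frac{1}{m}\Bigr)\, E(y_i\, y_k\, \tau^2), \quad j, l = 1,\ldots,m,$$

where

$$\chi(j,l) = \begin{cases} 1 & \text{for } j = l, \\ 0 & \text{for } j \ne l, \end{cases}$$

we get

$$E(y_i\, y_k\, \bar z) = E(y_i\, y_k\, z_1),$$

$$E(y_i\, y_k\, \bar z^2) = E(y_i\, y_k\, z_1^2) - \frac{m-1}{m}\, E(y_i\, y_k\, \tau^2),$$

where

$$E(y_i\, \tau^2) = E(\alpha\,\tau^2) + x_i\, E(\beta\,\tau^2),$$

$$E(y_i^2\, \tau^2) = E(\alpha^2\tau^2) + 2\,x_i\, E(\alpha\beta\,\tau^2) + x_i^2\, E(\beta^2\tau^2) + E(\sigma^2\tau^2),$$

$$E(y_i\, y_k\, \tau^2) = E(\alpha^2\tau^2) + (x_i + x_k)\, E(\alpha\beta\,\tau^2) + x_i\, x_k\, E(\beta^2\tau^2), \quad i, k = 1,\ldots,n, \; i \ne k.$$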


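We show that the estimator

$$\hat x_Q = \sum_{j=0}^{1}\sum_{i=0}^{n} \bar d_{ij}\, y_i\, \bar z^{\,j}$$

represents a solution of our original problem. Let

$$d_{i0} := \bar d_{i0}, \qquad d_{ij} := \frac{1}{m}\,\bar d_{i1}, \quad j = 1,\ldots,m; \; i = 0,\ldots,n.$$

It is easily shown that these terms solve the original system of equations

$$\sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, E(y_i\, y_k\, z_j\, z_l) = E(x\, y_k\, z_l);$$

therefore

$$\hat x_Q = \sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, y_i\, z_j = \sum_{j=0}^{1}\sum_{i=0}^{n} \bar d_{ij}\, y_i\, \bar z^{\,j}$$

is the risk minimizing estimator for x.

It should be noted that the mean value estimator is not always the only solution.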


Unbiasedness 


The first equation (k=l=0) of the system of equations determining the $d_{ij}$,

$$\sum_{j=0}^{m}\sum_{i=0}^{n} d_{ij}\, E(y_i\, z_j) = E(x),$$

shows that the estimator is unbiased with regard to the a priori distribution.
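The moment equations of the mean value form and the unbiasedness property can be illustrated by a Monte Carlo sketch in which the moments are replaced by sample averages over draws from a prior. Everything specific here is an assumption made for illustration: the prior in `draw_prior`, the design points, normally distributed errors, and the small Gaussian elimination helper. The sketch is not the computational procedure of the paper.

```python
import random


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol


def features(ys, z_bar):
    """Products y_i * z_bar^j for i = 0..n and j = 0, 1, with y_0 = 1."""
    y_ext = [1.0] + list(ys)
    return [y * zj for zj in (1.0, z_bar) for y in y_ext]


def draw_prior(rng):
    """Hypothetical prior: returns (alpha, beta, sigma, tau, x)."""
    return rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.1), 0.2, 0.2, rng.uniform(0.0, 3.0)


def fit_quadratic_bayes(design, m, n_draws, rng):
    """Solve the sample version of sum d_ij E(y_i y_k z^(j+l)) = E(x y_k z^l)."""
    rows, targets = [], []
    for _ in range(n_draws):
        alpha, beta, sigma, tau, x = draw_prior(rng)
        ys = [alpha + beta * xi + sigma * rng.gauss(0.0, 1.0) for xi in design]
        z_bar = alpha + beta * x + tau * rng.gauss(0.0, 1.0) / m ** 0.5
        rows.append(features(ys, z_bar))
        targets.append(x)
    p = len(rows[0])
    gram = [[sum(r[a] * r[b] for r in rows) / n_draws for b in range(p)]
            for a in range(p)]
    rhs = [sum(r[a] * t for r, t in zip(rows, targets)) / n_draws for a in range(p)]
    return solve_linear(gram, rhs), rows, targets


rng = random.Random(0)
d, rows, targets = fit_quadratic_bayes([0.0, 1.0, 2.0], m=4, n_draws=2000, rng=rng)
residuals = [t - sum(c * f for c, f in zip(d, r)) for r, t in zip(rows, targets)]
```

Because the constant product $y_0 \bar z^{\,0} = 1$ belongs to the basis, the equation for k=l=0 forces the average residual to vanish, which is the sample analogue of the unbiasedness statement above.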


One would regard the estimator as trivial if $d_{00} = E(x)$ and $d_{ij} = 0$ for $(i,j) \ne (0,0)$, i.e., if the estimator neglected the observations of $z_1,\ldots,z_m$. By inspection of the equations determining the $d_{ij}$, this holds if and only if

$$\operatorname{var}(x) \cdot E(\alpha\beta + \beta^2 x_k) = 0, \quad k = 0,\ldots,n,$$

or equivalently if

$$\operatorname{var}(x) = 0 \quad \text{or} \quad E(\beta^2) = 0.$$

Hence these cases have to be excluded.


Computational Procedure

In the following we consider only the mean value $\bar z$ of the observations $z_j$, j=1,...,m. Therefore we write z instead of $\bar z$ for the sake of simplicity. For the same reason we write $d_{ij}$ instead of $\bar d_{ij}$, i=0,1,...,n, j=0,1. This can be interpreted as the description of the situation where one has only one observation z, with the error variance $\frac{1}{m}\tau^2$ instead of $\tau^2$.

In the case that the first four joint moments of $y_1,\ldots,y_n$ and z can easily be obtained, another system of equations can be used for the determination of the estimator. With the definitions

$$A := \sum_{i=1}^{n} d_{i0}\, y_i, \qquad B := \sum_{i=1}^{n} d_{i1}\, y_i,$$

the system of 2n+2 equations for the coefficients $d_{i0}$, $d_{i1}$, i=0,...,n, has the following form:

$$d_{00} + E(A) + d_{01}\,E(z) + E(B z) = E(x),$$

$$d_{00}\,E(z) + E(A z) + d_{01}\,E(z^2) + E(B z^2) = E(x z),$$

$$d_{00}\,E(y_k) + E(A\,y_k) + d_{01}\,E(z\,y_k) + E(B\,y_k\,z) = E(x\,y_k),$$

$$d_{00}\,E(y_k z) + E(A\,y_k\,z) + d_{01}\,E(z^2 y_k) + E(B\,y_k\,z^2) = E(x\,y_k\,z), \quad k = 1,\ldots,n.$$

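Solving the first two equations for $d_{00}$ and $d_{01}$, we get

$$d_{01} = \frac{1}{\operatorname{var}(z)}\,\bigl[\operatorname{cov}(x, z) - E\bigl((A + B z)\,(z - E(z))\bigr)\bigr],$$

$$d_{00} = \frac{1}{\operatorname{var}(z)}\,\bigl[\operatorname{cov}(x z, z) - \operatorname{cov}(x, z^2) + E\bigl((A + B z)\,(z\,E(z) - E(z^2))\bigr)\bigr].$$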


Inserting these formulae into the remaining equations we get, with the map $f(\cdot,\cdot)$ defined for each pair of random variables U, V by

$$f(U, V) := \operatorname{var}(z)\,E(U V) - E(V)\,\bigl(\operatorname{cov}(U z, z) - \operatorname{cov}(U, z^2)\bigr) - E(V z)\,\operatorname{cov}(U, z),$$

the following system of equations for $d_{i0}$, $d_{i1}$, i=1,...,n:

$$\sum_{i=1}^{n} d_{i0}\, f(y_i, y_k) + \sum_{i=1}^{n} d_{i1}\, f(y_i z, y_k) = f(x, y_k),$$

$$\sum_{i=1}^{n} d_{i0}\, f(y_i, y_k z) + \sum_{i=1}^{n} d_{i1}\, f(y_i z, y_k z) = f(x, y_k z), \quad k = 1,\ldots,n.$$

i=l  i=l 

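Having  solved  these  equations,  we  can  determine  $d_{00}$  and  $d_{01}$  as  follows:

$$
\begin{aligned}
d_{01} &= \frac{1}{\operatorname{var}(z)}\cdot\Big[\operatorname{cov}(x,z) - \sum_{i=1}^{n} d_{i0}\cdot\operatorname{cov}(y_i,z) - \sum_{i=1}^{n} d_{i1}\cdot\operatorname{cov}(y_i\cdot z,z)\Big] \\
d_{00} &= \frac{1}{\operatorname{var}(z)}\cdot\Big[\operatorname{cov}(x\cdot z,z) - \operatorname{cov}(x,z^2) - \sum_{i=1}^{n} d_{i0}\cdot\big(\operatorname{cov}(y_i\cdot z,z)-\operatorname{cov}(y_i,z^2)\big) \\
&\qquad\qquad - \sum_{i=1}^{n} d_{i1}\cdot\big(\operatorname{cov}(y_i\cdot z^2,z)-\operatorname{cov}(y_i\cdot z,z^2)\big)\Big] ,
\end{aligned}
$$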

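where  the  moments  needed  explicitly  are  given  by

$$
\begin{aligned}
E(z) &= E(\alpha) + E(x)\cdot E(\beta) \\
E(y_k\cdot z) &= E(\alpha^2) + [x_k+E(x)]\cdot E(\alpha\cdot\beta) + x_k\cdot E(x)\cdot E(\beta^2) \\
E(y_i\cdot y_k) &= E(\alpha^2) + [x_i+x_k]\cdot E(\alpha\cdot\beta) + x_i\cdot x_k\cdot E(\beta^2) + \chi(i,k)\cdot E(\sigma^2) \\
E(y_i\cdot y_k\cdot z) &= E(\alpha^3) + [x_i+x_k+E(x)]\cdot E(\alpha^2\cdot\beta) + [x_i\cdot x_k+x_i\cdot E(x)+x_k\cdot E(x)]\cdot E(\alpha\cdot\beta^2) \\
&\quad + x_i\cdot x_k\cdot E(x)\cdot E(\beta^3) + \chi(i,k)\cdot[E(\alpha\cdot\sigma^2)+E(x)\cdot E(\beta\cdot\sigma^2)] \\
E(z^2) &= E(\alpha^2) + 2\cdot E(x)\cdot E(\alpha\cdot\beta) + E(x^2)\cdot E(\beta^2) + \frac{1}{m}\cdot E(\tau^2) \\
E(y_k\cdot z^2) &= \frac{1}{m}\cdot E(\alpha\cdot\tau^2) + x_k\cdot\frac{1}{m}\cdot E(\beta\cdot\tau^2) + E(\alpha^3) + [2\cdot E(x)+x_k]\cdot E(\alpha^2\cdot\beta) \\
&\quad + [2\cdot x_k\cdot E(x)+E(x^2)]\cdot E(\alpha\cdot\beta^2) + x_k\cdot E(x^2)\cdot E(\beta^3) \\
E(y_i\cdot y_k\cdot z^2) &= E(\alpha^4) + [x_i+x_k+2\cdot E(x)]\cdot E(\alpha^3\cdot\beta) + [x_i\cdot x_k+2\cdot(x_i+x_k)\cdot E(x)+E(x^2)]\cdot E(\alpha^2\cdot\beta^2) \\
&\quad + [(x_i+x_k)\cdot E(x^2)+2\cdot x_i\cdot x_k\cdot E(x)]\cdot E(\alpha\cdot\beta^3) + x_i\cdot x_k\cdot E(x^2)\cdot E(\beta^4) \\
&\quad + \frac{1}{m}\cdot E(\alpha^2\cdot\tau^2) + (x_i+x_k)\cdot\frac{1}{m}\cdot E(\alpha\cdot\beta\cdot\tau^2) + x_i\cdot x_k\cdot\frac{1}{m}\cdot E(\beta^2\cdot\tau^2) \\
&\quad + \chi(i,k)\cdot\Big[E(\alpha^2\cdot\sigma^2) + E(x^2)\cdot E(\beta^2\cdot\sigma^2) + 2\cdot E(x)\cdot E(\alpha\cdot\beta\cdot\sigma^2) + \frac{1}{m}\cdot E(\sigma^2\cdot\tau^2)\Big] \\
E(x\cdot y_k) &= E(x)\cdot[E(\alpha)+x_k\cdot E(\beta)] \\
E(x\cdot z) &= E(x)\cdot E(\alpha) + E(x^2)\cdot E(\beta) \\
E(x\cdot y_k\cdot z) &= E(x)\cdot E(\alpha^2) + [E(x^2)+x_k\cdot E(x)]\cdot E(\alpha\cdot\beta) + x_k\cdot E(x^2)\cdot E(\beta^2) .
\end{aligned}
$$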
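As a small illustration of how these formulas are used, the following Python sketch evaluates the moments involving only $x$ and $z$ from given a priori moments. The function name `z_moments` and the numerical values are hypothetical; the sketch assumes, as the formulas above do, that $x$ is independent of $\alpha$, $\beta$ and $\tau$.

```python
def z_moments(Ea, Eb, Ea2, Eab, Eb2, Ex, Ex2, Etau2, m):
    """Moments of z and of (x, z) from a priori moments.

    Ea, Eb       : E(alpha), E(beta)
    Ea2, Eab, Eb2: E(alpha^2), E(alpha*beta), E(beta^2)
    Ex, Ex2      : E(x), E(x^2)
    Etau2        : E(tau^2); m is the number of observations z_j
    """
    Ez = Ea + Ex * Eb
    Ez2 = Ea2 + 2 * Ex * Eab + Ex2 * Eb2 + Etau2 / m
    Exz = Ex * Ea + Ex2 * Eb
    return {
        "Ez": Ez,
        "Ez2": Ez2,
        "Exz": Exz,
        "var_z": Ez2 - Ez ** 2,
        "cov_xz": Exz - Ex * Ez,
    }

# Illustrative case: alpha = 1 and beta = 2 exactly known,
# E(x) = 1.5, var(x) = 1 (so E(x^2) = 3.25), E(tau^2) = 0.09, m = 3.
mom = z_moments(Ea=1.0, Eb=2.0, Ea2=1.0, Eab=2.0, Eb2=4.0,
                Ex=1.5, Ex2=3.25, Etau2=0.09, m=3)
```

In this degenerate case the result reduces to $\operatorname{var}(z)=\beta^2\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)$ and $\operatorname{cov}(x,z)=\beta\cdot\operatorname{var}(x)$.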


Let  us  now  assume  that  $\alpha$  and  $\beta$  are  exactly  known,  i.e.,

$$E(\alpha) = \alpha , \quad E(\beta) = \beta ; \qquad \operatorname{var}(\alpha) = \operatorname{var}(\beta) = 0 .$$


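Then  we  get

$$
\begin{aligned}
\operatorname{var}(z) &= \operatorname{var}(\alpha+\beta\cdot x+\tau\cdot v) = \beta^2\cdot\operatorname{var}(x) + \frac{1}{m}\cdot E(\tau^2) \\
\operatorname{cov}(x,z) &= \operatorname{cov}(x,\alpha+\beta\cdot x) = \beta\cdot\operatorname{var}(x) \\
\operatorname{cov}(x\cdot z,z) - \operatorname{cov}(x,z^2) &= -\alpha\cdot\beta\cdot\operatorname{var}(x) + \frac{1}{m}\cdot E(\tau^2)\cdot E(x) \\
E(x\cdot y_k) &= E(x)\cdot E(y_k) \quad\text{etc.,}
\end{aligned}
$$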


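and  therefore

$$
\begin{aligned}
f(x,y_k) &= E(y_k)\cdot\Big[E(x)\cdot\big(\beta^2\cdot\operatorname{var}(x)+\tfrac{1}{m}\cdot E(\tau^2)\big) + \alpha\cdot\beta\cdot\operatorname{var}(x) - \tfrac{1}{m}\cdot E(\tau^2)\cdot E(x) \\
&\qquad\qquad - (\alpha+\beta\cdot E(x))\cdot\beta\cdot\operatorname{var}(x)\Big] = 0 , \\
f(x,y_k\cdot z) &= E(y_k)\cdot\Big[E(\alpha\cdot x+\beta\cdot x^2)\cdot\big(\beta^2\cdot\operatorname{var}(x)+\tfrac{1}{m}\cdot E(\tau^2)\big) + (\alpha+\beta\cdot E(x))\cdot\big(\alpha\cdot\beta\cdot\operatorname{var}(x)-\tfrac{1}{m}\cdot E(\tau^2)\cdot E(x)\big) \\
&\qquad\qquad - \big(\beta^2\cdot\operatorname{var}(x)+\tfrac{1}{m}\cdot E(\tau^2)+(\alpha+\beta\cdot E(x))^2\big)\cdot\beta\cdot\operatorname{var}(x)\Big] = 0 .
\end{aligned}
$$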
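The two cancellations can be checked directly; the following steps are a worked verification added for illustration. In $f(x,y_k)$ the terms $\pm\frac{1}{m}\cdot E(\tau^2)\cdot E(x)$ cancel, leaving

$$\beta^2\cdot\operatorname{var}(x)\cdot E(x) + \alpha\cdot\beta\cdot\operatorname{var}(x) - \alpha\cdot\beta\cdot\operatorname{var}(x) - \beta^2\cdot E(x)\cdot\operatorname{var}(x) = 0 .$$

For $f(x,y_k\cdot z)$, note that $E(\alpha\cdot x+\beta\cdot x^2) - \beta\cdot\operatorname{var}(x) = E(x)\cdot(\alpha+\beta\cdot E(x)) = E(x)\cdot E(z)$, so the terms multiplied by $\operatorname{var}(z)=\beta^2\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)$ add up to $\operatorname{var}(z)\cdot E(x)\cdot E(z)$. The remaining terms are

$$E(z)\cdot\Big[\alpha\cdot\beta\cdot\operatorname{var}(x) - \tfrac{1}{m}\cdot E(\tau^2)\cdot E(x) - (\alpha+\beta\cdot E(x))\cdot\beta\cdot\operatorname{var}(x)\Big] = -E(z)\cdot E(x)\cdot\operatorname{var}(z) ,$$

and the two contributions sum to zero.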


As the system of equations for the $d_{i0}$, $d_{i1}$, $i=1,\dots,n$, is homogeneous, the system has the trivial solution

$$d_{i0} = d_{i1} = 0 \quad\text{for } i=1,\dots,n .$$

This result is reasonable: If the parameters $\alpha$ and $\beta$ of the regression line are exactly known, one does not need the $y_i$ for estimating these parameters.

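With  $A=B=0$  we  get

$$
\begin{aligned}
d_{01} &= \frac{\operatorname{cov}(x,z)}{\operatorname{var}(z)} = \frac{\beta\cdot\operatorname{var}(x)}{\beta^2\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)} \\
d_{00} &= \frac{\operatorname{cov}(x\cdot z,z)-\operatorname{cov}(x,z^2)}{\operatorname{var}(z)} = \frac{-\alpha\cdot\beta\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)\cdot E(x)}{\beta^2\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)} .
\end{aligned}
$$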
Therefore  $x$  is  estimated  by

$$\hat{x}_Q = d_{00} + d_{01}\cdot z = \frac{-\alpha\cdot\beta\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)\cdot E(x)+\beta\cdot\operatorname{var}(x)\cdot z}{\beta^2\cdot\operatorname{var}(x)+\frac{1}{m}\cdot E(\tau^2)} .$$


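This  estimate  can  be  written  in  the  following  intuitive  form

$$\hat{x}_Q = \frac{E(x)}{1+\dfrac{m\cdot\beta^2\cdot\operatorname{var}(x)}{E(\tau^2)}} + \frac{\dfrac{z-\alpha}{\beta}}{1+\dfrac{E(\tau^2)}{m\cdot\beta^2\cdot\operatorname{var}(x)}} ,$$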


which can be interpreted as follows: If the a priori information on $x$ is much better than the measurement uncertainty, i.e., if $\frac{1}{m}\cdot E(\tau^2) \gg \beta^2\cdot\operatorname{var}(x)$, then $x$ is simply estimated by the a priori information. In the opposite case $x$ is estimated by inverting the regression line.
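The two limiting cases can be illustrated numerically. The following Python sketch evaluates the estimate for exactly known $\alpha$ and $\beta$ in both the coefficient form $d_{00}+d_{01}\cdot z$ and the weighted form. The function names and all numerical values are hypothetical and serve only as an illustration.

```python
def x_hat_coefficients(z, alpha, beta, Ex, var_x, Etau2, m):
    """Estimate of x as d00 + d01 * z for exactly known alpha and beta."""
    noise = Etau2 / m                      # (1/m) * E(tau^2)
    denom = beta ** 2 * var_x + noise      # equals var(z)
    d01 = beta * var_x / denom
    d00 = (-alpha * beta * var_x + noise * Ex) / denom
    return d00 + d01 * z

def x_hat_weighted(z, alpha, beta, Ex, var_x, Etau2, m):
    """Same estimate as a weighted mean of E(x) and the inverted line."""
    ratio = m * beta ** 2 * var_x / Etau2
    return Ex / (1 + ratio) + ((z - alpha) / beta) / (1 + 1 / ratio)

args = dict(z=5.0, alpha=1.0, beta=2.0, Ex=1.5, var_x=1.0, m=3)
moderate = x_hat_coefficients(Etau2=0.09, **args)
same = x_hat_weighted(Etau2=0.09, **args)
noisy = x_hat_weighted(Etau2=1e9, **args)     # close to E(x)
precise = x_hat_weighted(Etau2=1e-9, **args)  # close to (z - alpha) / beta
```

Both forms give the same value. For very large $E(\tau^2)$ the estimate approaches the a priori mean $E(x)$, and for very small $E(\tau^2)$ it approaches the inverted regression line $(z-\alpha)/\beta$.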
CONCLUSIONS


Four estimators of practical importance were considered: the maximum likelihood estimator, the inverse regression estimator, the two stage estimator $\hat{x}_{AJ}$, and the quadratic estimator $\hat{x}_Q$. All of them are linear in the mean value $\bar{z}$, i.e., all have the form $\hat{x}=c_0+c_1\cdot\bar{z}$. The coefficients $c_0$ and $c_1$ depend on the observations $y_1,\dots,y_n$ and the a priori information. Since the expectation value of the maximum likelihood estimator does not exist, its relevance as a point estimator seems doubtful. All other estimators have their own merits and shortcomings. The inverse regression estimator is easily calculable, but up to now it has been justified as a Bayesian estimator only for special a priori distribution functions. The two stage estimator $\hat{x}_{AJ}$ uses only the first and second a priori moments instead of the whole a priori distribution. Its numerical expenditure is substantially higher than that of the inverse regression estimator. The estimator $\hat{x}_{AJ}$, however, needs further theoretical investigation.

The quadratic estimator is the only one which has been derived as a solution of a risk minimizing problem. It is the only one which is linear in the observations $y_1,\dots,y_n$. Its confidence region and sequential properties have not been investigated as yet. Furthermore, even more computational effort is needed than for the other estimators. In addition, the required knowledge of the third and fourth moments of the a priori distribution requires an increased effort. Whether this problem can be circumvented by a similar "semi-minimax" estimator using only the first two a priori moments cannot be answered as yet.


So far only a few numerical calculations have been performed. They indicated that the four different methods led to not too different estimates of $x$, but that the coefficients $c_0$ and $c_1$ of the linear form $c_0+c_1\cdot\bar{z}$ differ substantially depending on the a priori information. Thus it seems that considerable numerical work is required in order to get a feeling for the usefulness of the various approaches under given circumstances.


Although a large amount of research effort has already been invested in the inverse regression problem, only a few results have been obtained, especially if more general nonlinear estimators are considered. It seems that the scope of the problem of inverse linear regression has not yet been understood.